Europäisches Patentamt
European Patent Office
Office européen des brevets

Publication number: 0 660 249 A1

EUROPEAN PATENT APPLICATION

Application number: 94309095.1
Date of filing: 07.12.94
Int. Cl.6: G06F 17/30

Priority: [illegible]
Date of publication of application: 28.06.95 Bulletin 95/26
Designated Contracting States: DE FR GB IT

Applicant: AT & T CORP.
32 Avenue of the Americas
New York, NY 10013-2412 (US)

Inventor: Gabbe, John D.
14 Laurelwood Drive
Little Silver,
New Jersey 07739 (US)

Inventor: Ginsberg, Allen
[street address illegible]
Jackson,
New Jersey 08527 (US)

Inventor: Robinson, Bethany S.
40 Tulip Lane
Colts Neck,
New Jersey 07722 (US)

Representative: Buckley, Christopher Simon Thirsk et al
AT&T (UK) LTD.,
AT&T Intellectual Property Division,
5 Mornington Road
Woodford Green,
Essex IG8 0TU (GB)

Table of contents indexing system.

The present invention is a method of operating a
data processor, or plurality of data processors, to 
index data for one or more users involving the steps 
of receiving primary data, delineating episode data 
from the primary data, generating media representa- 
tions to correspond with the episode data, generat- 
ing a table of contents from the episode data and 
media representations in real time, and storing 
and/or presenting the table of contents on a graphi- 
cal user interface. Primary data may constitute audio 
data, video data, text data, combinations of these as 
well as any medium-generated data which may be in 
or converted to digital form. Episode data is gen- 
erated on the basis of significant video images de- 
termined by scene changes and other information. In 
addition, episode data may be based on a significant 
audio change, delineated from the primary data, e.g. 
based on speaker changes and other information. 
Media representations are iconic images, symbolic 
representations or graphical images which refer to 
episode data to ultimately aid the user in making 
decisions from reviewing the real time table of con- 
tents. 



FIG. 7



FIELD OF THE INVENTION 

The present invention relates generally to 
methods for operating a processor to index data. 
More specifically, one aspect of the present inven- 
tion relates to a method for real time generating of 
an iconic table of contents from raw audio data, 
video data, or other data. 

INFORMATION DISCLOSURE STATEMENT 

Generally, systems which allow a user to man- 
ually index digital video and audio data by textual 
annotation are known in the prior art. Systems 
which allow playback of digital video and audio 
data concerning a conference while recording said 
conference are also known in the prior art. 

"Where Were We: Making and Using Near- 
synchronous, Prenarrative Video," Scott L. Min- 
neman and Steve R. Harrison, (Minneman) was 
published in Multimedia 1993 , and discloses man- 
ual indexing of video and audio information through 
the use of complex computer hardware with high 
memory capacity. The Minneman system is one in 
which the user may manually select a video image 
and manually annotate the video image with text 
for subsequent use. Minneman discloses no intel- 
ligent agent for filtering significant occurrences on 
the video data. 

"A Magnifier Tool for Video Data" (Mills) was 
written by Michael Mills, Jonathan Cohen and Yin
Yin Wong and was published in the May 3-7, 1992
issue of the Proceedings of Computer Human
Interaction on pages 93-98. Mills discloses a
simplistic temporal index of video data which does
not identify significant occurrences in the content
of the video data, but merely indexes video data by
digitizing still frame samples at regular time
intervals. The Mills system does not appear to
operate in real time because, when the "Magnifier"
program is initially entered, six frames spanning
the segments of an entire 30 minute presentation
are illustrated.

"CECED: A System For informal Multimedia 
Collaboration" was written by Earl Craighill, Ruth
Lang, Martin Fong, and Keith Skinner and published
in Proceedings of the Association for Computing
Machinery Multimedia of 1993. The Craighill
publication discloses a "Process Manager" which
automatically collects, stores, and replays design
traces that were generated during a single user or
multiparty design session, and only allows the user
to select and annotate significant occurrences in
the process trace after playback.

"Video Conferencing File Storage and Manage- 
ment in Multimedia Computer Systems" was writ- 
ten by P. Venkat Rangan in Computer Networks 
and ISDN Systems, March 1993, pp. 901-919. The
Rangan article discloses a video conferencing
system which allows the user to play back a
recorded conference while the real time conference
is continuing. The Rangan article discloses a Video
File Server that allows a user-participant in a
conference to manually record the proceedings of a
conference into a document, and the Video File
Server stores the video associated with the
document in a textual file. However, the Video File
Server does not automatically filter important
occurrences and does not create a table of contents
or index based thereon.

SUMMARY OF THE INVENTION 

The present invention is a method of operating
a data processor, or plurality of data processors,
to index data for one or more users involving the
steps of receiving primary data, delineating
episode data from the primary data, generating
media representations to correspond with the
episode data, generating a table of contents from
the episode data and media representations in real
time, and storing and/or presenting the table of
contents on a graphical user interface. Primary
data may constitute audio data, video data, text
data, combinations of these as well as any
medium-generated data which may be in or converted
to digital form. Episode data is generated on the
basis of significant video images determined by
scene changes and other information. In addition,
episode data may be based on a significant audio
change, delineated from the primary data, e.g.
based on speaker changes and other information.
Media representations are iconic images, symbolic
representations or graphical images which refer to
episode data to ultimately aid the user in making
decisions from reviewing the real time table of
contents.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be understood by
reference to the following description considered
in conjunction with the following accompanying
drawings, in which like reference numerals identify
like elements, as follows:

Figure 1 shows one possible hardware configuration
for implementing the present invention.
Figure 2 shows the main steps involved in the
method for operating a processor to index data
according to the present invention.
Figure 3 is a flowchart depicting the data flow
regarding the step of delineating episode data
according to the present invention.
Figure 4 is a flowchart depicting the step of
filtering event data according to the present
invention.
Figure 5 is a diagram illustrating the step of
presenting the table of contents by showing one
possible screen configuration on the video display
device according to the present invention.
Figure 6 shows an alternative embodiment of
the present invention showing information flow in
a client server network.
Figure 7 shows an alternative embodiment of
the present invention showing the information
flow associated with the means for storing data
of the episode server.
Figure 8 shows an alternative embodiment of
the present invention showing the information
flow associated with the means for retrieving
data of the episode server.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

The present invention is directed to providing a 
method for automatically indexing data such that a 
user will not be required to peruse the contents of 
the data and/or manually index data. The method 
includes real time generation of a table of contents 
of video, textual, audio, and other available informa- 
tion which is analogous to the table of contents in a 
book. The method of the invention will improve the 
utility of information storage and retrieval systems 
by associating ancillary data which can be under- 
stood by processing systems, with multimedia data 
which is presently incomprehensible to processing 
systems. 

The present invention method is also used to 
facilitate interactions at conferences by allowing 
user participants to more conveniently find and 
replay important segments recorded during the 
conferences for emphasis, to enhance
comprehension, or to otherwise instantly or
subsequently review portions or all of the
recordation selectively.

The present invention method, in addition to 
relying upon episode data for its table of contents, 
also automatically generates a media representa- 
tion or icon which is a miniature graphical image of 
the video data such that differences of semantic 
interpretations are minimized. 

The present invention method relies on the 
reliable and dependable response of a processor 
programmed to generate media representations 
which always correspond to the appropriate audio 
or video events. Moreover, the present embodi- 
ment permits semi-automatic annotation of video 
by allowing the user to merely highlight or block 
text to be annotated to video or audio via the 
graphical user interface. 

In addition, the present invention overcomes
the problem of an index containing spurious
information from the data or missing important
information from the data by employing an
intelligent filtering agent to identify significant
occurrences. The present invention also overcomes
the problem of processor systems having
insufficient throughput to index video data in real
time, as distinguished from merely recording it in
real time.

Generally, the present invention performs the
main steps of delineating episode data, generating
media representations, generating a table of
contents in real time, and presenting a table of
contents by efficiently filtering input to the
first processor(s) by delineating significant video
data, audio data, event data, or metadata which is
to be indexed.

The present invention organizes primary data
in real time to make the information conveniently
accessible through a graphical user interface. Real
time operation means that a data processor system
processes input data immediately upon receipt, such
that any minimal delay of output data is immaterial
to a human user or substantially imperceptible to a
human user. Primary data encompasses audio-data,
video-data, text data, combinations thereof, and
any other form of data that can be in or converted
to digital data. The present invention has a myriad
of applications including facilitating conferences
between users at remote locations, enhancing
interaction between engineers during the design
process, replaying news from live broadcasts,
replaying political debates from live broadcasts,
and previewing movies by viewing selected segments.

HARDWARE REQUIREMENTS 

The hardware requirements for the present
invention will vary depending upon whether the
present invention is used to interpret primary data
which is solely audio-data, solely visual-data, or
audiovisual data. The hardware requirements will
also vary depending upon the quantity of primary
data and the desired quality of the presentation on
the graphical user interface and audio interface.
Generally, the present invention should be
implemented by using a general purpose computer
such as the Sun SPARC 10 with multiple processors.
Alternatively, the present system can be
implemented upon several general purpose computers
in cooperative communication. The system can
operate on a general computer having a single
processor only where the quantity and quality of
the primary data so permits. In an alternate
embodiment, the system can also operate on computer
architecture which supports a client-server
network. For example, Fig. 6, discussed below,
illustrates the information flow of the indexing
system in a client-server network.

If the primary data contains video-data or
audiovisual data, then the general computer must
have sufficient memory, buffers, and processor ca-
pacity to manipulate a continuous input of video
bitmaps, while reserving processor time and mem- 
ory for other tasks in a windows system operating 
environment. The minimum hardware and proces- 
sor requirements tend to increase proportionately 
in response to enhanced display resolution, in- 
creased number of colors featured on the display, 
increased duration of primary data to be pro- 
cessed, and concurrent processing of different 
types of primary data. Hence, the system's require- 
ments for processors, memory buffers, random ac- 
cess memory (RAM), and storage disks may ne- 
cessitate the upgrading of the general purpose 
computer with commercially available components. 
For example, commercially available disks with 
high capacities ranging in the gigabytes may be 
used to store digital primary data or derivations 
thereof. A 1.0 gigabyte disk is sufficient to store 
from 60 to 75 minutes of audiovisual data com- 
pressed in the JPEG format. Compression of audio 
and video data may be implemented by known 
software programs, or preferably by known firm- 
ware. 
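
As a rough check on the storage figure above, the average data rate implied by storing 60 to 75 minutes of compressed audiovisual data on a 1.0 gigabyte disk can be computed. The following is only a sketch: it assumes a decimal gigabyte (10^9 bytes), and the function name is illustrative and does not appear in the specification.

```python
def implied_rate_kb_per_s(disk_bytes: float, minutes: float) -> float:
    """Average data rate, in kilobytes per second, that fills the disk in `minutes`."""
    seconds = minutes * 60
    return disk_bytes / seconds / 1000


DISK_BYTES = 1.0e9  # assumption: 1.0 gigabyte taken as 10**9 bytes

for minutes in (60, 75):
    rate = implied_rate_kb_per_s(DISK_BYTES, minutes)
    print(f"{minutes} minutes -> about {round(rate)} kB/s")
```

Under these assumptions the stated capacity corresponds to an average compressed rate of roughly 222 to 278 kilobytes per second.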

Fig. 1 illustrates an example of one possible
hardware configuration for the present invention.
The individual elements of the disclosed hardware
configuration of Fig. 1 are well-known in the prior
art. In addition, the elements in Fig. 1 are
combined in a multiprocessor configuration which is
well-known in the prior art. The hardware
configuration of Fig. 1 is illustrated to emphasize
the importance of selecting, for implementing the
present invention, a hardware configuration which
is capable of processing continuous inputs of
digital video and audio data in real time. The
hardware configuration of Fig. 1 will also
facilitate the subsequent description of data
manipulation in the detailed description of the
present invention.

The first processor 10 can address memory
locations in the first local random access memory
16, the first input memory buffer 18, the first
read only memory 14, and the global random access
memory 44. The direct memory access co-processor 13
can transfer data between global RAM 44, first
local random access memory (first local RAM) 16, or
the database 26 and any input/output device located
on the first local databus 38 without direct
involvement of the first processor 10. Meanwhile,
the second processor 12 can address memory
locations in the second local random access memory
50, the second read only memory 52, and the global
random access memory 44. Thus, the first processor
10 and the second processor 12 exchange data via
global RAM 44 under the supervision of the first
bus controller 40 and the second bus controller 46.
In sum, the system is configured such that the
first processor 10 and the second processor 12 can
simultaneously process data independently and such
that the first processor 10 and the second
processor 12 can share input and output data via
the global RAM 44.

Other features of the system are also adapted
to process large amounts of data. For example, the
graphics display controller 30 may contain a video
processor and supplemental random access memory to
increase the processing rate of video images. The
hardware is configured for a graphical user
interface 34 partially by the input/output ports 44
for the keyboard and mouse. The graphical user
interface 34 includes the visual display device, a
keyboard, and a pointing device such as a mouse.
"Select" and other forms of the word used
throughout the specification refer, where the
context permits, to the user input of choosing an
alternative presented on a display device by using
a pointing device or keyboard.

The hardware configuration also features an
impedance bridge 22 for matching the respective
impedances of the first input memory buffer 18 and
the second input memory buffer 20 to the impedance
of the data source 36. The first input memory
buffer 18 and second input memory buffer 20 allow
the first processor 10 or the direct memory access
co-processor 13 to retrieve data from the first
input memory buffer 18 while the disk controller
24, with a supplementary processor, retrieves
primary data 102 from the data source 36 via the
second input memory buffer 20. The disk controller
24 stores the primary data 102 from the second
input memory buffer 20 and derivations from the
primary data 102. The database 26 must have a high
storage capacity determined on the basis of the
quantity and quality of the primary data 102 to be
stored. The information stored in the database 26
is preferably stored in the form of objects.
Computer languages such as Smalltalk of the Xerox
Palo Alto Research Center allow the programmer to
define objects. Objects may constitute tasks and
processes or data elements and other
representations of data. The programmer defines the
way each object can communicate with other objects
such that communication between objects may be
executed nonsequentially. The communication between
objects allows processes to be executed.

DEFINITIONS FOR TYPES OF DATA 

Referring to Fig. 2, the present invention
manipulates sundry types of data including primary
data 102, episode data 104, event data 106,
metadata 108, and media representations 202.
Primary data 102 encompasses audio-data,
video-data, text-data, and audiovisual data.
Primary data 102 must be in a digital format before
any processing can occur. For example, video
primary data 102 can be converted from NTSC format
to JPEG digital format using commercially available
equipment. Text data may be in ASCII format.

Episode data 104 includes a significant video 
image delineated from the primary data 102 based 
on significant scene changes, simultaneous 
changes in video and audio, or an audio recording 
representing a significant change in speakers. Epi- 
sode data 104 also includes significant audio ex- 
cerpts delineated from the primary data 102 based 
on speaker changes and other factors. 

A media representation 202 may be a min- 
iature graphical image of episode data 104 when 
the primary data is video-data. A media representa- 
tion 202 may be a graphical image of the person 
speaking when the episode data 104 is solely 
audio-data. A media representation 202 may be a 
graphical, symbolic image representing notes from 
an application word processing program. 

A media representation 202 is an icon displayed
on the graphical user interface 34 in a window. A
window refers to a work area or display area in a
graphical user interface that responds to distinct
user input. Windows are displayed through a
windowing program 110 such as those known in the
art or created by the able artisan. The media
representation will differ depending upon whether
the media representation was derived from episode
data 104, event data 106, or metadata 108. If the
episode data 104 is video data, then the media
representations 202 derived from episode data are
miniature graphical images. On the other hand, if
the episode data 104 is audio information, then a
media representation 202 derived from episode data
104 is a graphical, symbolic image of the person
speaking when the primary data 102 is solely
audio-data.

A media representation 202 derived from event 
data 106 or metadata 108 can take virtually any 
form on the graphical user interface 34 or the audio 
interface 54. Event data 106 is a record created by 
the windowing program 110 in response to the 
user's input 111 or network input 111, the applica-
tion programs 109, or external input. For example,
event data 106 is generated in response to user 
input 111 when the user presses or releases a 
keyboard key, holds down a keyboard key, ac- 
tivates a window, updates a window, or presses a 
mouse button. Event data 106 may also be gen- 
erated in response to the external input in the form 
of arrival of a predefined string at a serial or 
parallel port of the general purpose computer. 
Metadata 108 is control data generated by the 
application program 109, the real time indexing 
program 112 or the basic operating system such 
as UNIX. 

Occurrences refer to the content of any type of 
data which is bounded by temporal constraints. 



The first plurality of knowledge representations
and the second plurality of knowledge
representations contain instances of facts,
knowledge and relationships between data elements
which assist the processor in properly evaluating
primary data 102, episode data 104, and media
representations 202. The first plurality of
knowledge representations may be expressed in the
form of semantic networks, rule-based systems or
frames. Semantic networks are an assembly of nodes
which represent concepts, objects or data elements
and links which characterize the relationship
between the nodes. The thesaurus knowledge
representation scheme is an example of a semantic
network. Object oriented programs are well suited
for semantic networks because of the need for
object oriented programs to inherit characteristics
of connected nodes. The preferred embodiment of the
invention utilizes a semantic network coupled with
object oriented programming. However, the present
invention may also be implemented using a
rule-based system for knowledge representation
which consists of a series of if-then statements.
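
By way of illustration only, a semantic network of the kind described above, in which nodes inherit characteristics from connected nodes, might be sketched as follows. The class name, the "is_a" relation and the example properties are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A concept, object or data element in a small semantic network."""
    name: str
    properties: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)  # relation name -> list of Nodes

    def link(self, relation, other):
        """Record a named relationship from this node to another node."""
        self.links.setdefault(relation, []).append(other)

    def lookup(self, key):
        """Return a property, inheriting through 'is_a' links when not set locally."""
        if key in self.properties:
            return self.properties[key]
        for parent in self.links.get("is_a", []):
            value = parent.lookup(key)
            if value is not None:
                return value
        return None


# Hypothetical example: two kinds of episode sharing an inherited property.
episode = Node("episode", {"indexed": True})
scene_change = Node("scene_change", {"medium": "video"})
speaker_change = Node("speaker_change", {"medium": "audio"})
scene_change.link("is_a", episode)
speaker_change.link("is_a", episode)
```

In this sketch a rule-based alternative would replace the links with explicit if-then tests; the network form keeps shared characteristics in one place.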

Referring to Fig. 2, the main steps of the
preferred embodiment are delineating episode data
100, generating media representations 200, real
time generating of a table of contents 300 and
presenting a table of contents 400.

DELINEATING EPISODE DATA

Delineating episode data 100 from primary
data 102 differs depending upon whether audio-data,
video-data, or audiovisual data is being processed.
The delineating episode data step in block 100 is
based upon executing procedures in cooperation with
the plurality of first knowledge representations.
The plurality of first knowledge representations is
stored in the first local random access memory 16
and the global random access memory 44. The
plurality of first knowledge representations
contains relationships and may contain procedures
which relate each episode data 104 to the primary
data 102. The plurality of first knowledge
representations enables the processor, or the first
processor 10 and the second processor 12, to
distinguish significant scene changes with respect
to video data and speaker changes with respect to
audio data. In the preferred embodiment, speaker
changes are identified by known voice operated
relay (VOR) circuitry used in conjunction with
known digital logic circuitry to determine the
principal floor speaker.

If the primary data 102 includes video-data,
then the plurality of first knowledge
representations are applied to the primary data 102
through blocks 120, 140, 160, 180 and others as
illustrated in Fig. 3. If the primary data 102
includes audio-data or audiovisual data, then the
plurality of first knowledge representations are
applied through different routines.

Audio data is flagged as potential episode data 
104 when a speaker change occurs. Additional 
filtering compares the potential audio episode data 
with prior audio episode data for a fixed time 
interval. If the audio spectral frequency compo- 
nents of the potential episode data deviate from the 
prior episode data by a fixed amount, then the 
potential audio episode data is bona fide episode
data and is flagged as episode data 104. 
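
The additional audio filtering described above may be sketched as follows. This is a minimal illustration under stated assumptions: each audio excerpt is represented by a list of spectral band magnitudes, the deviation measure is a mean absolute difference, and the threshold value is arbitrary. None of these choices, nor the function names, is specified in the present description.

```python
def spectral_deviation(candidate, prior):
    """Mean absolute difference between two equal-length lists of band magnitudes."""
    if len(candidate) != len(prior):
        raise ValueError("spectra must have the same number of bands")
    if not candidate:
        raise ValueError("spectra must not be empty")
    return sum(abs(c - p) for c, p in zip(candidate, prior)) / len(candidate)


def is_bona_fide_episode(candidate, prior_episodes, min_deviation):
    """True if the candidate deviates by at least `min_deviation` from every
    prior audio episode retained for the fixed time interval."""
    return all(
        spectral_deviation(candidate, prior) >= min_deviation
        for prior in prior_episodes
    )


# Hypothetical spectra for one prior episode and two candidates.
prior_window = [[1.0, 0.5, 0.2]]
same_speaker = [1.0, 0.5, 0.25]
new_speaker = [0.2, 0.9, 0.8]
```

With a threshold of 0.1, the first candidate would be rejected as too close to the prior episode and the second would be flagged as episode data.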

With respect to the video data, the plurality of 
first knowledge representations contains informa- 
tion, and electively contains procedures, corre- 
sponding to the luminance values of any bitmaps 
which allow the processor, or the first processor 10 
and the second processor 12, to distinguish scene 
changes. The plurality of first knowledge repre- 
sentations also contains information and possibly 
procedures to further filter the scene changes cho- 
sen according to luminance such that only signifi- 
cant scene changes are flagged as episode data 
104. Fig. 3 shows how the plurality of first knowl- 
edge representations is applied to select episode 
data 104 from video-data as primary data 102. The 
registers, counters, and pointers referenced in Fig. 
3 correspond to the following physical elements of 
computer hardware: the similar counter in block 
116 and most counter in block 118 may be general 
purpose registers in the first processor 10 or the 
second processor 12. On the other hand, the simi- 
lar and most counters may be located in the first 
local random access memory 16, the global ran- 
dom access memory 44, or the second local ran- 
dom access memory 50. The M pointer in block 
118 and N pointer in block 113 are general pur- 
pose registers or a stack pointer. 

As an alternative to delineating episode data 
100 pursuant to Fig. 3, various known scene 
change algorithms can be utilized, which may be 
coupled with additional intelligent filtering as set 
forth in block 180 of Fig. 3. 

In Fig. 3, the blocks illustrate the step of
delineating episode data 100 pursuant to the
preferred embodiment of the present invention.
First, respective words M from bitmap N and
corresponding words M from bitmap N + C are read
into the data registers in block 114. As mentioned
previously, the data registers are general purpose
registers of the first processor 10 or second
processor 12 for the temporary storage of data. C
represents any positive integer less than the
number of bitmaps received into the first input
memory buffer 18. Second, in block 120, the data
registers of block 114 are compared to see if the
luminance characteristics of the words M are
similar. If the luminance characteristics of the
respective words M from bitmap N and the
corresponding words M from bitmap N + C deviate
from one another by approximately less than twenty
percent (20%), then the respective words M from
bitmap N and the corresponding words M from bitmap
N + C are similar. Third, the similar counter in
block 116 is incremented if the respective words M
from bitmap N are sufficiently similar to the
corresponding words from bitmap N + C. Fourth, the
steps in blocks 114 and 120 are repeated via block
140 until a sufficient sample of the bitmap N and
bitmap N + C are compared. A sufficient sample of
the bitmap is bounded by the allowable maximum
degree of error in calculating the similarity
between bitmap N and N + C at one extreme and the
available processing time of the first processor 10
and the second processor 12 at the other extreme.
To conserve available processing time, the
sufficient sample can be, but need not be, limited
to a maximum of seventy percent (70%) of the words
comprising bitmap N and bitmap N + C. Fifth, once a
sufficient sample of the bitmap N and the bitmap
N + C have been compared, the block 160 determines
whether a scene change has occurred by evaluating
the value of the similar counter. If the bitmap N
and the bitmap N + C are similar as determined by a
minimum threshold value in the similar counter,
then the similar counter is cleared, the most
counter is cleared, the M pointer is set to 1 and
the N pointer is incremented by C. The minimum
threshold value in the similar counter correlates
to the luminance characteristics of bitmap N and
bitmap N + C deviating by less than twenty percent
(20%). If the bitmap N and bitmap N + C are not
similar, then the bitmaps are evaluated according
to the block 180. Block 180 determines which video
scene changes are most significant by evaluating
the video scene changes in the light of
simultaneous audio changes and video changes,
changes in the colors of the bitmaps, and
additional comparisons of bitmaps previously
identified as different by block 180. If the bitmap
N + C is sufficiently different from all bitmaps in
a fixed prior time interval according to block 180,
then bitmap N + C is flagged as episode data in
block 115. Finally, in block 117 the episode data
104 is stored in the first local random access
memory 16, local random access memory 50, or global
random access memory 44 where the applications
program 109 can access the episode data 104 for
further tasks.
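
The luminance comparison of blocks 114 through 160 may be sketched as follows. This is an illustration only: each bitmap is assumed to be a list of non-negative luminance values, one per word; the deviation is assumed to be measured relative to the larger of the two values; and the minimum fraction of similar words is an arbitrary placeholder, since the description gives no numeric threshold for the similar counter. The further filtering of block 180 is not modelled.

```python
def words_similar(a, b, tolerance=0.20):
    """True if two luminance values deviate by less than `tolerance`
    of the larger value (assumed measure of deviation)."""
    larger = max(a, b)
    if larger == 0:
        return True  # both words are black
    return abs(a - b) / larger < tolerance


def is_candidate_scene_change(bitmap_n, bitmap_nc,
                              sample_fraction=0.70,
                              min_similar_fraction=0.80):
    """Compare a sample of corresponding words from bitmap N and bitmap N + C.

    Returns True when too few sampled words are similar, i.e. when the pair
    would be passed on for further evaluation as a possible scene change.
    """
    if not bitmap_n or len(bitmap_n) != len(bitmap_nc):
        raise ValueError("bitmaps must be non-empty and of equal length")
    sample_size = max(1, int(len(bitmap_n) * sample_fraction))
    similar = 0  # plays the role of the similar counter
    for m in range(sample_size):  # m plays the role of the M pointer
        if words_similar(bitmap_n[m], bitmap_nc[m]):
            similar += 1
    return similar < min_similar_fraction * sample_size


# Hypothetical frames: a slight brightness drift, then a cut to a dark scene.
frame_a = [100] * 10
frame_b = [105] * 10
frame_c = [10] * 10
```

Sampling only part of each bitmap trades accuracy for processing time, which is the compromise the seventy percent limit above is intended to strike.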

FILTERING EVENT DATA 

Trivial event data must be distinguished from
significant event data 106 (see Fig. 2). Media
representations 202 will only be generated
corresponding to significant event data 106 and
significant metadata 108. For example, a media
representation 202 should not be generated for
every press of the mouse button on the graphical
user interface 34. However, the user may want to
generate a media representation 202 when a certain
designated time has elapsed from when the user
started his presentation.

One means of filtering event data to obtain 
significant event data 106 is disclosed in Fig. 4. In 
this embodiment, event data 106 is intercepted in a 
block 230. A determination of whether this data 
represents a request to create a new client window 
(i.e. a main working window of application) is made 
in block 211. If it does, then a copy of the request 
that created this window is saved in block 212. 
Block 212 creates an entry in a Client_rec_jd 
vector for this window, and sets it equal to zero. 
The Client_rec_Jd vector is an array window id. 
Each window has a unique id which is used to 
index into this vector. The id number of the last 
archived record is stored for each window in these 
locations. Next, a block 213 determines whether 
the application sends text display requests by us- 
ing actual characters, pixmaps, or both. If it uses 
pixmaps only, than the application returns to start. 
If the application uses characters or a combination 
of characters and pixmaps, then block 214 al- 
locates a 2D-array. This array keeps track of the 
text that appears in this window and the location of 
the characters. The 2D-array functions when a re- 
quest is made to display character "C n at location 
X,Y in the window, the type of font requested will 
always be known. Therefore, the number of pixels 
the display C requires and the number cells (n) in 
the 2D-array C needs to maintain the correspon- 
dence with the real window will also be known. 
When C is entered at X,Y in the array, the 2D-array 
cells - X+1, Y...X +(n-1), Y ~ will be tagged with 
a special non-displayable character or code. Com- 
paring a string S with the contents of the 2D-array 
begins at a starting location in the array which is 
given by the display request. The first character of 
the 2D-array is compared to the first character in S. 
If they are equal, the font sizes are compared. The 
font sizes of S are known, and the font sizes of the 
2D-array are determined by counting the number 
of special characters that follow the given char- 
acter. 
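
The bookkeeping described above can be sketched as follows. This is a minimal illustration only, assuming every character of the compared string has the same width n; the names FILLER, make_grid, put_char and matches are hypothetical and do not appear in the specification.

```python
FILLER = "\x00"  # hypothetical non-displayable tag for cells covered by a wide character


def make_grid(cols, rows):
    """Allocate an empty 2D array indexed as grid[y][x]."""
    return [[None] * cols for _ in range(rows)]


def put_char(grid, x, y, ch, n):
    """Enter ch at (x, y) and tag the next n-1 cells on the row as filler."""
    grid[y][x] = ch
    for dx in range(1, n):
        grid[y][x + dx] = FILLER


def matches(grid, x, y, text, n):
    """Compare text (each character n cells wide) with the row starting at (x, y)."""
    row = grid[y]
    for ch in text:
        if x >= len(row) or row[x] != ch:
            return False
        # The stored width is one plus the run of filler cells after the character.
        width = 1
        while x + width < len(row) and row[x + width] == FILLER:
            width += 1
        if width != n:
            return False
        x += n
    return True
```

In this sketch a font-size mismatch is detected purely from the count of filler cells, which mirrors the counting rule given in the text.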

Following the allocation of the 2D array, a block
215 will set the Text_Change_Counter variable to
zero for this window. The Text_Change_Counter
is a local variable for each window which keeps
track of the number of characters added/changed
since the last text was archived for this window.

If event data 106 does not represent a request
to create a new client window, then a determination
is made in a block 216 whether the event is a request
to display one or more characters of text in an
existing client window. If the determination is
affirmative, then a block 217
determines the correspondence between locations
referred to in the display request and
locations in the window's 2D array given the known
properties of the font requested by the application.
Following this determination, block 218 compares
the contents (if any) of the corresponding locations
in the 2D array with the contents to be displayed in
the corresponding window locations referenced in
the display request. A decision is made in block 219
whether the two contents are identical. If they
are, then the application returns to start. If they are
not, then a block 220 updates the 2D array to
reflect the new text. Following this update, a block
221 adds the number of characters in the update to
this window's Text_Change_Counter. A decision
is then made in block 222 whether
Text_Change_Counter for this window exceeds
Text_Index_Threshold. Text_Index_Threshold
refers to an integer which specifies the number of
characters that must be added or changed in a
window before indexing takes place. If
Text_Change_Counter does not exceed the threshold, then the
application returns to start. However, if it does, then a
block 223 archives the contents of the 2D array,
indexes them, and includes the create request for this
window as a field in the archived record. If the
application includes other displays besides text (graphics
or pixmaps), block 223 sends the window a pseudo redisplay
message (a detailed description of the pseudo
redisplay request is given below). It also
captures the window's resulting display requests
and includes them in the archived record, and
includes a list of current record ids for all members
of the Client_rec_id vector as a field in the
archived record. Block 223 then changes the entry
for this window in Client_rec_id to the id of this
archived record. Finally, block 223 sets
Text_Change_Counter to zero for this window.

If block 216 determines that the event is not a request
to display one or more characters of text in an
existing client window, then a block 224 determines whether the
event is a request to destroy an existing client
window. If so, then a block 225 determines whether
this window has a 2D array with a positive
Text_Change_Counter. If so, then the information
is managed by block 223 as described above. If
not, the application returns to start.
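
The text path of Fig. 4 can be illustrated with a short sketch. It is a simplification under stated assumptions: each character occupies one cell, the threshold value 5 is invented for the example, and WindowState and on_text_request are hypothetical names rather than elements of the specification.

```python
TEXT_INDEX_THRESHOLD = 5  # assumed value; the source only says it is an integer


class WindowState:
    def __init__(self):
        self.cells = {}               # (x, y) -> character, standing in for the 2D array
        self.text_change_counter = 0  # characters changed since the last archive


def on_text_request(state, x, y, text, archive):
    """Blocks 217-223, simplified to one cell per character.

    Returns True when the window's text was archived.
    """
    new = {(x + i, y): ch for i, ch in enumerate(text)}
    if all(state.cells.get(pos) == ch for pos, ch in new.items()):
        return False                            # block 219: nothing changed
    state.cells.update(new)                     # block 220: update the array
    state.text_change_counter += len(text)      # block 221: count the update
    if state.text_change_counter <= TEXT_INDEX_THRESHOLD:
        return False                            # block 222: threshold not exceeded
    archive.append(dict(state.cells))           # block 223: archive a snapshot
    state.text_change_counter = 0
    return True
```

A redisplay of unchanged text leaves the counter untouched, so only genuinely new or changed characters move a window toward being indexed.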

If block 224 determines that this event is not a
request to destroy an existing client window, then a
block 226 determines whether this event is a request
to write to the display buffer of an existing
window. If it is, then a block 227 computes the size
(%) of the affected area of the window, and adds it
to a G_Change_Counter for this window. The
G_Change_Counter is a local variable for each
window that keeps track of the percentage of the
window affected by draw commands since the last
graphics were archived for the window. Following
this step, a block 228 determines whether
G_Change_Counter for this window exceeds a
G_Index_Threshold variable. The
G_Index_Threshold variable refers to an integer
which specifies the percentage of the window that
must be affected by draw commands before indexing takes
place. If G_Change_Counter does not exceed the threshold,
then the application returns to start. However, if it
does, then a block 229 sends the window a
pseudo redisplay message, captures the window's
resulting display requests, and includes them in the
archived record. Block 229 also includes a list of
current record ids for all members of the
Client_rec_id vector as a field in the archived
record. Block 229 then changes the entry for this
window in Client_rec_id to the id of this archived
record. Next, block 229 sets
Text_Change_Counter to zero for this window. Finally,
block 229 includes the create request for this
window as a field in the archived record. Following
block 229, the application returns to start.
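
A sketch of the graphics path follows. The threshold value, the rectangle-based area measure, and the function names are assumptions made for illustration. Resetting the graphics counter after archiving is likewise an assumption of this sketch; the description above names Text_Change_Counter in that step.

```python
G_INDEX_THRESHOLD = 50  # assumed percentage; the source leaves the value open


def affected_percent(win_w, win_h, rect_w, rect_h):
    """Block 227: size of the drawn area as a percentage of the window."""
    return 100.0 * (rect_w * rect_h) / (win_w * win_h)


def on_draw_request(counter, win_size, rect_size):
    """Blocks 227-228: return (new_counter, should_archive)."""
    counter += affected_percent(*win_size, *rect_size)
    if counter > G_INDEX_THRESHOLD:
        # Block 229 would archive here; zeroing this counter is an assumption.
        return 0.0, True
    return counter, False
```

Overlapping draw requests are counted twice in this sketch, so the counter is an upper bound on the changed area rather than an exact figure.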

If block 226 determines that this event is not a 
request to write to the display buffer of an existing 
window, then a block 230 determines whether this 
event is a request to change a property of an 
existing window. If not, the application returns to 
start. If so, then a decision is made in a block 231 
whether the property specifies a displayed text 
field; e.g., the title bar of the window. If it does not, 
the application returns to start. If the property does 
specify a displayed text field, then a block 232 
stores the named property and its value, and in- 
cludes it as a field when archiving records for this 
window. Following this step, the application returns 
to start. 

A Pseudo-Redisplay request is a request sent 
by the window server or manager to a client asking 
it to redraw itself. In this case, however, the result- 
ing client requests are not actually executed but 
are simply intercepted by the indexing program to 
be part of the archive. This pseudo request is not 
part of the existing window protocol, but could 
easily be integrated into it. The functionality of the 
pseudo-redisplay request during retrieval and 
playback time begins when a user issues a query. 
The system will return zero or more pointers or 
headers of potentially relevant archived records 
depending upon the indexing and retrieval methods 
employed. The user then selects one of these
records for replay. The retrieved record contains 1)
the necessary request for generating the window,
2) all the text that was displayed in it at a certain
point of time, and 3) a list of Client records in the
archive that represents the closest prior archived
state of other windows that were copresent with
this window at or near the time this record was
archived. Given the text and the window, it is trivial
to automatically generate requests that will display
the text in the window. To duplicate the environment
at the time the record was archived, the
system retrieves the records in this list from the
archive and executes the necessary creation and
display requests.
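
Replay of an archived record might be sketched as below. The record layout (keys "create", "text" and "others") and the function name are hypothetical, and ids with no archived record, such as the initial zero entries, are simply skipped here as an assumption.

```python
def replay(archive, record_id):
    """Return the requests to execute, in order, to rebuild the archived environment.

    archive maps record id -> record. Each record is a dict with hypothetical keys
    "create" (the window's create request), "text" (its displayed text) and
    "others" (ids of the closest prior records of copresent windows).
    """
    requests = []
    record = archive[record_id]
    for rid in list(record["others"]) + [record_id]:
        if rid not in archive:
            continue  # e.g. an id of zero: no record was archived for that window
        rec = archive[rid]
        requests.append(rec["create"])
        requests.append(("display_text", rid, rec["text"]))
    return requests
```

The copresent windows are rebuilt first so that the selected window ends up created last, which is one plausible ordering and not one the text prescribes.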

"Animation" of a window, i.e., playing back 
its appearance over a time segment, can be achieved
using this algorithm by storing time-stamped
lists of display requests with each archived
record. At playback time, one merely executes
the creation and display requests in the
chronological order determined by the timestamps.
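
As a small illustration, ordering the stored requests for playback could look like this, assuming each stored entry is a (timestamp, request) pair; the function name is hypothetical.

```python
def playback_order(timestamped_requests):
    """Return requests sorted by timestamp; ties keep their stored order."""
    ordered = sorted(timestamped_requests, key=lambda item: item[0])
    return [request for _, request in ordered]
```

Because Python's sort is stable, requests that share a timestamp are replayed in the order in which they were archived.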


GENERATING MEDIA REPRESENTATIONS 

Generating media representations 200 differs
depending upon whether episode audio-data or
episode visual-data is being processed. Generating
media representations 200 also differs depending upon
whether media representations 202 are derived
from episode data 104 as opposed to event data
106 or metadata 108. The plurality of second
knowledge representations contains relationships
and may contain procedures which relate respective
ones of episode data 104 to corresponding
ones of media representations 202, respective ones
of event data 106 to corresponding ones of media
representations 202, and respective ones of
metadata 108 to corresponding ones of media
representations 202. The media representation 202
corresponding to an event datum 106 may be
generic for the event, or the plurality of second
knowledge representations may establish the icon
display information about the event based on
information other than the identity of the event.

REAL TIME GENERATING OF A TABLE OF CONTENTS

The next main step in the process is the generating
of a table of contents in real time. Windows
are created in a windows operating environment
according to well-known methods. The windows are
configured to contain media representations 202 in
an organized fashion.

PRESENTATION OF THE TABLE OF CONTENTS 


The next main step in the process is the presenting
of the table of contents 400. One possible
embodiment of the table of contents 302 is
illustrated in Fig. 5. There is a parent
window 105 which encompasses the entire background
of the display. On the top left is the primary
data window 102, and on the bottom is a series of
smaller windows called descendent windows 103,
which contain the media representations 202.

The windows are displayed in the usual manner
using a windowing program 110. Windowing
programs 110 allow the programmer great flexibility
in the appearance of the final program. However,
as the media representations 202 progress
through a sequence of time, only a limited number 
of media representations 202 can be displayed in 
sibling or descendent windows 103. Thus, the pre- 
senting of the table of contents 400 involves a 
continuous updating of media representations 202 
in the descendent windows 103 or sibling windows 
with the media representations 202 most recently 
created from episode data 104, event data 106, or 
metadata 108. The earlier media representations 
202 may be stored in the first local random access 
memory 16, the second local random access mem- 
ory 50, or the global random access memory 44, 
via the real time indexing program 112, in video 
supplemental random access memory via the win- 
dowing program 110, or in the database 26 via the 
application program 109 or the real time indexing 
program 112. Storage and quick retrieval of these 
media representations 202 is desirable to preserve 
the sequence of occurrences of video images in a 
presentation or video. By default the descendent 
windows 103 will display the most recent media 
representations 202. However, the user may scroll 
through the entire sequence of media representa- 
tions 202 which are stored in the first local random 
access memory 16, the second local random ac- 
cess memory 50, the global random access mem- 
ory 44, or the database 26. 
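
The display behaviour described above, a fixed number of visible windows showing the newest media representations by default with scrolling over the stored sequence, can be sketched as follows. The class and method names are hypothetical, and snapping back to the newest items whenever one is added is an assumption of the sketch.

```python
class ContentsStrip:
    """Keeps every media representation and shows a fixed number at a time."""

    def __init__(self, visible):
        self.visible = visible   # how many descendent windows are on screen
        self.items = []          # full stored sequence, oldest first
        self.offset = 0          # 0 shows the most recent items

    def add(self, rep):
        self.items.append(rep)
        self.offset = 0          # assumed: the view returns to the newest items

    def scroll(self, steps):
        """Positive steps move toward older items, negative toward newer."""
        max_offset = max(0, len(self.items) - self.visible)
        self.offset = min(max(self.offset + steps, 0), max_offset)

    def shown(self):
        end = len(self.items) - self.offset
        return self.items[max(0, end - self.visible):end]
```

Nothing is discarded when an item leaves the visible range; it stays in the stored sequence, which stands in for the memories and database named in the text.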

ANNOTATING WITH ANCILLARY DATA 

The present invention also allows users to an- 
notate episode data by appending ancillary data to 
the episode data. Ancillary data is data which the 
user manually inputs by a keyboard or which is 
semi-automatically input by a user highlighting or 
blocking a segment of preexisting text. Ancillary 
data may constitute notes generated by the user or 
may constitute excerpts of text from documents 
presented in a window during the conference. Re- 
spective ones of ancillary data are associated with 
corresponding ones of episode data 104 to facili- 
tate retrieval. 

To accomplish semi-automatic annotation, the 
means for annotating any episode data 104 encom- 
passes a plurality of third knowledge representa- 
tions to understand the significance of highlighting 
text data in the application program 109 document. 
Alternatively, the application program 109 could be 
modified to send real time notification and informa- 
tion concerning the highlighting of text data to the 
real time indexing program 112. The application
program 109 may communicate with the real time
indexing program 112 via standard protocols or via
the application program protocol.

The following example explains how ancillary
data may be used to annotate video data or audio
data. If the user blocks text in a word processing
program during a conference, then a particular
event data called event data Z is created. Event
data Z is associated with the episode data 104 that is
closest in time to when the blocking occurred.
Referring to Fig. 2, the semi-automatic annotation
process is represented by the arrow showing data
flow from block 106 to block 100. The resulting
appended episode data from the process is useful
for searching the video data or audio data content
of multiple conferences.
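
Associating an event with the episode closest in time can be sketched with a binary search. The sketch assumes each episode is represented by a single timestamp in a sorted list; the function name and the tie-breaking rule are hypothetical.

```python
from bisect import bisect_left


def closest_episode(episode_times, event_time):
    """Return the index of the episode whose time is nearest to event_time.

    episode_times must be sorted ascending and non-empty; ties go to the
    earlier episode.
    """
    i = bisect_left(episode_times, event_time)
    if i == 0:
        return 0
    if i == len(episode_times):
        return i - 1
    before, after = episode_times[i - 1], episode_times[i]
    return i - 1 if event_time - before <= after - event_time else i
```

For example, with episodes at times 0, 10, 25 and 40, a text-blocking event at time 12 is attached to the episode at time 10.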

ALTERNATIVE EMBODIMENT, CLIENT-SERVER 
NETWORK 

The alternative embodiment in Fig. 6 illustrates
the present invention implemented in a client-server
network and the associated information flow in
said client-server network.

The alternative embodiment consists of an episode
server 500, a network manager 570, a plurality
of participating clients 600, internal sources 620,
external sources 650, and an archive 670.

The episode server 500 controls the network
manager 570, the input of primary data 102 from
the external sources 650, the storing and retrieval
of primary data 102 and derivations thereof in the
archive 670, and the creation and manipulation of
windows on the graphical user interface of participating
clients 600.

The participating clients 600 use the facilities
provided by the episode server 500. The participating
clients 600 are work stations which support a
graphical user interface. The participating clients
600 and the episode server 500 communicate via a
network manager 570.

The network manager 570 may encompass
network protocols such as TCP/IP, DECnet and
Chaos. Each participating client 600 may, but is not
required to, request the episode server 500 to
perform tasks. The network manager 570 will place
the participating clients 600 in queue until the episode
server 500 can service the participating client's
request. Alternatively, the network manager
570 may poll the participating clients 600 for data.
The network manager 570 provides communication,
switching and control functions between ones
of the participating clients 600 and the episode
server 500; between ones of the external sources
650 and the episode server 500; between ones of
the internal sources 620 and the episode server
500; and between ones of the participating clients
600. The network manager 570 also controls timing
to prevent data from colliding.
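
The queueing of client requests might be sketched as below. First-in, first-out ordering is an assumption, since the text says only that clients are placed in queue; RequestQueue and its methods are hypothetical names.

```python
from collections import deque


class RequestQueue:
    """Holds client requests until the episode server can service them."""

    def __init__(self):
        self._pending = deque()

    def submit(self, client_id, request):
        self._pending.append((client_id, request))

    def next_request(self):
        """Called when the server is free; returns None if nothing is waiting."""
        return self._pending.popleft() if self._pending else None
```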

The archive 670 is attached to the episode
server 500. The archive 670 is a high-capacity
storage disk which stores the index assembly,
which includes episode data 104, table of contents
302, media representations 202, ancillary data, and
other common files for all of the participating
clients 600 in the system.

Internal sources encompass sources of
metadata 108 and event data 106. Thus, internal 
data refers to both metadata 108 and event data 
106. External sources include sources of primary 
data 102 as well as metadata 108 and event data 
106 from other interconnected networks. Thus, ex- 
ternal data encompasses primary data 102 as well 
as metadata 108 from other interconnected net- 
works and event data 106 from other intercon- 
nected networks. 

The client-server network illustrated in Fig. 6 
may be implemented using window software and 
computer architecture which is generally known. 

The episode server 500 performs functions
which may be divided into two main categories: (1)
means for storing data as illustrated in Fig. 7, and
(2) means for retrieving data as illustrated in Fig. 8.

MEANS FOR STORING DATA 

Means for storing data include operating the 
episode server 500 to accomplish the following 
storage functions: delineating episode data 502, 
generating media representations 504, real time 
generating of a table of contents 504, annotating 
episode data 506 and indexing 508 based on the 
ancillary data 530. Delineating episode data 502 is 
based on the input of external data 651 and the 
input of event data 106. After episode data 104 is 
delineated pursuant to the details discussed in the 
preferred embodiment, the output is organized in 
the index assembly process of block 510 in a 
suitable manner for storage in the archive 670. 

Generating media representations 504 and real
time generating of a table of contents 504 are based
on the input of client requests 512, event data 106
and episode data 104. Client requests 512 are
requests from a user to influence a step or an entire
process pursuant to a finite number of permissible 
alternatives. The output of the generating media 
representation 504 and real time generating of a 
table of contents 504 is organized in the index 
assembly process of block 510 in a suitable man- 
ner for storage in the archive 670. In addition, the 
output of generating media representations 504 
and real time generating of a table of contents 504 
is routed to the annotating episode data process in 
block 506. 

The annotating process in block 506 accepts
client notes 514, event data 106, media representations
202, and the table of contents 302 as input.
Client notes 514 are ancillary data 530 which is
entered by manually typing characters on the
keyboard or by semi-automatically highlighting a
textual entry with a pointing device. The annotating
process in block 506 feeds its output into the index
assembly process 510 which organizes the index
assembly in proper form for storage in the archive
670. In addition, the annotating process 506 output
is routed to the indexing process 508.

The indexing process 508 allows media representations
202 to be sorted by ancillary data 530.
The inputs of the indexing process are episode data
104 with appended ancillary data from block 506
and client requests 512. The output of indexing in
block 508 is fed directly into the archive 670 or is
fed indirectly into the archive via the index assembly
process in block 510.

MEANS FOR RETRIEVING DATA 


Means for retrieving data are illustrated in Fig.
8. Means for retrieving data include operating the
episode server 500 to retrieve data such that query
processing 522 and index assembly retrieval and
browsing 524 may occur. Query processing 522 is
initiated by the input of a client query 516. The
information being queried can be stored to facilitate
retrieval by using a variety of known methods such
as by using inverted files. The information in the
table of contents 302 can be queried, as can
other information contained in the archive 670.
In response to a user entering a client query 516,
the episode server 500 will identify one or more
relevant media representations 202 and display the
relevant media representations 202 for the user on
the graphical user interface of the participating
client 600. The user may then make a client selection
520 to view the corresponding episode data
104.
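
An inverted file, one of the known storage methods mentioned above, can be illustrated briefly. This sketch assumes archived records are reduced to plain text keyed by record id and that a query matches records containing every query word; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(records):
    """Map each lowercase word to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index


def query(index, words):
    """Return the ids of records containing every query word, sorted."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []
```

The returned ids would then be used to fetch the corresponding media representations for display to the user.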

The index assembly retrieval and browsing process
in block 524 allows the user to select index assembly
output 518 in the form of replays of the primary
data 102, presentations of media representations
202, presentations of table of contents 302 displays,
and presentations of ancillary data 530 displays
via client requests 512 and client selections 520 as
input. The episode server 500 will access the appropriate
record for the user in the archive 670 and
display the record as output on a participating
client's graphical user interface. Moreover, the episode
server 500 distinguishes windowing protocols
that describe window configuration (i.e. geometry,
background, color) from the windowing messages
that describe the contents of the window such that
episode data 104 may be replayed independently
via client requests 512 and client selections 520 as
input to the episode server 500.






Obviously, numerous modifications and vari- 
ations of the present invention are possible in light 
of the above teachings. It is, therefore, understood 
that within the scope of the appended claims, the 
invention may be practiced otherwise than as spe- 
cifically described herein. 

Claims 

1. A method of operating a data processor to 
index data for at least one user, comprising: 

a) receiving primary data from a data 
source, the primary data consisting of a 
sequence of words; 

b) delineating a plurality of episode data 
from the primary data, each episode data 
being at least one word of said sequence of 
words, each episode data being stored in 
memory; 

c) generating a plurality of media repre- 
sentations, to correspond with said episode 
data; 

d) generating of a table of contents in real 
time with said plurality of episode data and 
said plurality of media representations; and 

e) presenting the table of contents to said at 
least one user on at least one display de- 
vice, the table of contents permitting the 
user to select respective ones of the plural- 
ity of media representations to replay cor- 
responding ones of the plurality of episode 
data. 

2. A method of operating a data processor to 
index data according to claim 1 wherein said 
sequence of words in step a) has a particular 
word, the particular word being the primary 
data received during a discrete, identifiable 
time interval; and 

the steps b), c), and e) are executed im- 
mediately after the particular word is received 
in step a) such that the media representation 
derived from the particular word is presented 
to said at least one user immediately after the 
particular word is received. 

3. A method of operating a data processor to 
index data according to claim 1 wherein the 
plurality of media representations is generated 
in step c) by executing procedures in coopera- 
tion with a plurality of second knowledge re- 
presentations and wherein the plurality of epi- 
sode data is delineated in step b) by executing 
procedures in conjunction with a plurality of 
first knowledge representations. 

4. A method of operating a data processor to
index data according to claim 1 further comprising
the step of:

f) storing the plurality of episode data and
the plurality of media representations in a
database.

5. A method of operating a data processor to
index data according to claim 1 further comprising
the steps of:

f) allowing a user to retrieve selective ones
of said episode data by selecting corresponding
ones of the media representations;

g) allowing a user to input ancillary data to
annotate the selected episode data that the
user retrieved in step f);

h) presenting the ancillary data to a user;

i) storing the ancillary data such that the
ancillary data is appended to the particular
selected episode data that was retrieved in
step f);

j) indexing the ancillary data such that each
ancillary data has a first ancillary address
stored in the database; and

k) retrieving the ancillary data such that the
ancillary data is replayed for said user.

6. A method of operating a data processor to
index data according to claim 5 further comprising
the steps of:

l) sorting the ancillary data by using input
values selected by a user and by using the
intrinsic values of the ancillary data;

m) generating a second ancillary address
corresponding to the ancillary data that was
sorted according to step l), the second
ancillary address being stored in the
database; and

n) displaying the ancillary data that was
sorted according to step l) on a display
device by using the second ancillary address.

7. A method of operating a data processor to
index data according to claim 6 further comprising
the steps of:

o) sorting the episode data by sorting respective
ones of ancillary data that were
appended to the corresponding ones of said
plurality of episode data in claim 6.

8. A method of operating a data processor to
index data according to claim 1 wherein the
sequence of words relates to a given occurrence
selected from a plurality of occurrences
by the user, respective ones of said episode
data being associated with corresponding ones
of said occurrences; and

further comprising the step of permitting
the user to select a presentation of corresponding
ones of said occurrences by selecting
respective ones of said episode data.

9. A method of operating a data processor to 
index data according to claim 1 wherein the 
data source is selected from the group consist- 
ing of an audio data source, a visual data 
source, an audiovisual data source, a text data 
source, a user input data source, and combina- 
tion thereof. 

10. A data processor system for indexing data 
comprising: 

a) a receiver, the receiver capable of de- 
modulating a primary data signal to obtain 
primary data, the receiver having a memory 
buffer selected from the group of first input 
memory buffers and second input memory 
buffers to store the primary data in digital 
form, the primary data of at least one se- 
quence of words; 

b) a plurality of episode data, each episode 
data being at least one word of the se- 
quence of words; 

c) a plurality of first knowledge representa- 
tions containing that relate the plurality of 
episode data to the primary data; 

d) a plurality of second knowledge repre- 
sentations that relate respective ones of 
said plurality of media representations with 
corresponding ones of said episode data; 

e) a processor selected from the group of 
first processor, second processor, and first 
and second processor considered collec- 
tively to generate each media representa- 
tion from any given episode data, from the 
plurality of episode data, based on said 
second knowledge representations, and to 
generate each episode data from any given 
primary data based on said first knowledge 
representations; and 

f) a means for real time generating of a 
table of contents from the plurality of epi- 
sode data and from the media representa- 
tions. 

11. The system according to claim 10 further com- 
prising means for presenting primary data to a 
user before the primary data is stored. 

12. The system according to claim 10 further com- 
prising means for generating a media repre- 
sentation and means for associating respective 
ones of said plurality of episode data with 
corresponding ones of said media representa- 
tions. 



13. The system according to claim 10 further com- 
prising means for storing a table of contents of 
the plurality of episode data and media repre- 
sentations. 


14. A multimedia conferencing system comprising:

a) a plurality of participating clients, the
participating clients being work stations, the
work stations supporting a graphical user
interface;

b) external sources, the external sources
having external data, external data including
primary data, and metadata and event data
from other networks;

c) internal sources, the internal sources
encompassing local sources of metadata and
event data;

d) an episode server, the episode server
having a means for storing data, said means
for storing data responsive to the input of
external data, event data, client request or
client notes, the episode server having a
means for retrieving data, said means for
retrieving data being responsive to the input
of a client query, client selection or client
request, said means for storing data including
delineating episode data by executing
procedures in conjunction with a plurality of
first knowledge representations;

e) a network manager, the network manager
being operably, electrically connected to
said episode server, said external sources
and said participating clients, the network
manager having a means for providing communication,
switching and control functions
between ones of said participating clients
and the episode server, between ones of
the participating clients, between the external
sources and the episode server, and
between the internal sources and the episode
server; and

f) an archive, the archive operably attached
to the episode server, the archive being a
high-capacity storage device for storing the
episode data, and for storing other common
files for the participating clients.

15. The multimedia conferencing system of claim
14 wherein the plurality of first knowledge
representations contain knowledge concerning
the luminance values of bitmaps, the correlation
of simultaneous changes in audio and video
and the color of bitmaps.